Table of Contents
- 1. Probability
- 2. Probability Space
- 3. Random Variable
- 4. Probability Distribution
- 4.1. Properties
- 4.2. Probability Mass Function
- 4.3. Probability Density Function
- 4.4. Cumulative Probability Function
- 4.5. Normalization and Denormalization
- 4.6. Combination
- 4.7. Instances
- 5. Parametric Family
- 6. Stochastic Process
- 7. Filtration
- 8. References
- Probability Calculus
1. Probability
1.1. Marginal Probability
- Probability distribution of a subset of a larger collection of random variables.
1.2. Conditional Probability
- Probability contingent upon the values of the other variables.
- \[ p_{Y|X}(y\mid x) := \mathrm{P}[Y=y\mid X=x] = \frac{\mathrm{P}(\{X=x\}\cap \{Y=y\})}{\mathrm{P}(\{X=x\})} \]
- \[ f_{Y|X}(y\mid x) = \frac{f_{X,Y}(x,y)}{f_X(x)} \]
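The two formulas above can be checked on a small discrete example. A minimal sketch, with a made-up joint table (the values are assumptions, not from any referenced source):

```python
# Conditional pmf p_{Y|X}(y|x) computed from a discrete joint pmf p_{X,Y}.
p_joint = {
    ("x0", "y0"): 0.1, ("x0", "y1"): 0.3,
    ("x1", "y0"): 0.2, ("x1", "y1"): 0.4,
}

def p_marginal_x(x):
    # p_X(x) = sum over y of p_{X,Y}(x, y)
    return sum(p for (xi, _), p in p_joint.items() if xi == x)

def p_conditional(y, x):
    # p_{Y|X}(y|x) = p_{X,Y}(x, y) / p_X(x)
    return p_joint[(x, y)] / p_marginal_x(x)

print(p_conditional("y1", "x0"))  # 0.3 / 0.4 ≈ 0.75
```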
1.3. Law of Total Probability
- Relation between marginal probability and conditional probability.
- \[ \mathrm{P}(A) = \sum_k \mathrm{P}(A\cap B_k) = \sum_k \mathrm{P}(A\mid B_k)\mathrm{P}(B_k) \]
- \[ \mathrm{P}(A) = \int_\mathbb{R}\mathrm{P}(A\mid X=x)\,dF_X(x) \]
- Further, \[ \mathrm{P}(A\mid B) = \sum_n \mathrm{P}(A \mid C_n)\,\mathrm{P}(C_n\mid B) \]
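A minimal sketch of the finite form of the law, using exact rational arithmetic on a fair die (the event and partition are arbitrary example choices):

```python
# Law of total probability on a finite sample space:
# P(A) = sum_k P(A|B_k) P(B_k) for a partition {B_k} of Omega.
from fractions import Fraction as F

omega = {1, 2, 3, 4, 5, 6}                  # one fair die
P = {w: F(1, 6) for w in omega}             # uniform probability measure

def prob(event):
    return sum(P[w] for w in event)

A = {2, 4, 6}                               # "even"
partition = [{1, 2}, {3, 4}, {5, 6}]        # the B_k

lhs = prob(A)
rhs = sum(prob(A & B) / prob(B) * prob(B) for B in partition)
print(lhs, rhs)  # both 1/2
```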
1.4. Bayesian Probability
- Interpretation of probability as reasonable expectation, instead of frequency or propensity.
- It represents a state of knowledge, or a quantification of personal belief.
2. Probability Space
2.1. Definition
- A measure space (../calculus/Measure Theory.html#org4a45981) such that the measure of the whole space is one.
- It is a triple \((\Omega, \Sigma, \mathrm{P})\) consisting of
- The sample space \(\Omega\): an arbitrary non-empty set,
- The event space \(\Sigma\): a sigma-algebra (../calculus/Measure Theory.html#org39f4f7f) on \(\Omega\),
- \(\Sigma\) stands for sigma-algebra. \(\mathcal{F}\) (for filtration) or \(\mathcal{A}\) are also often used by convention.
- The probability measure \(\mathrm{P}: \Sigma \to [0, 1]\).
2.2. Probability Measure
2.2.1. Definition
- Probability measure \(\mathrm{P}\) is a measure (../calculus/Measure Theory.html#org39ea87f) over \(\Omega\), with:
- \(\mathrm{P}: \Sigma \to [0, 1]\), with \(\mathrm{P}(\varnothing) = 0\) and \(\mathrm{P}(\Omega) = 1\).
- Countable Additivity: \[ \mathrm{P}\left(\bigcup_{i\in \mathbb{N}}E_i\right) = \sum_{i\in \mathbb{N}}\mathrm{P}({E_i}) \] where \(\{E_i\}\) are pairwise disjoint sets.
- The validity of this definition of a probability measure is precisely given by the Kolmogorov axioms:
- Non-negativity
- Unit measure
- σ-additivity
2.2.2. Notations
- Probability that a random variable \(X\) takes a value in a measurable set \(S\subseteq E\) is written as \[ \mathrm{P}[X\in S] := \mathrm{P}(\{\omega \in \Omega\mid X(\omega)\in S\}). \]
3. Random Variable
- Kolmogorov Definition
- Random variable is a measurable function (../calculus/Measure Theory.html#org90515ad) from the sample space to a measurable space (../calculus/Measure Theory.html#org4a45981) \(E\), often taken to be \(\mathbb{R}\): \[ X: \Omega \to E. \]
4. Probability Distribution
- Probability distribution forgets the probability space, and only remembers the output values of a random variable.
4.1. Properties
4.1.1. Mean
4.1.2. Variance
4.1.3. Skewness
- Korean: 왜도
4.1.4. Kurtosis
- Korean: 첨도
4.1.5. Absolutely Continuous
- A distribution whose support is uncountable, and which admits a density function.
- Random variable \(X\) is absolutely continuous if there exists a function \(f_X\) such that for each interval \([a,b] \subseteq \mathbb{R}\): \[ \mathrm{P}[a\le X \le b] = \int_a^b f_X(x)\,dx \]
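As a numerical sanity check of the defining identity (a sketch, assuming the standard normal as the density), the integral can be approximated by a midpoint Riemann sum and compared with the closed form in terms of the error function:

```python
import math

def f(x):
    # standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def prob(a, b, n=100_000):
    # midpoint Riemann sum approximating P[a <= X <= b] = ∫_a^b f
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# Closed form: P[-1 <= X <= 1] = erf(1/sqrt(2))
exact = math.erf(1 / math.sqrt(2))
print(prob(-1.0, 1.0), exact)  # both ≈ 0.6827
```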
4.2. Probability Mass Function
- The probability mass function \(p_X(x)\) is defined as \[ p_X(x) := \mathrm{P}[X=x]. \]
4.3. Probability Density Function
4.3.1. Definition
- The probability density function of a random variable \(X\) that takes values in a measure space \((E, \Sigma, \mu)\) is the Radon-Nikodym derivative (../calculus/Measure Theory.html#org073f99b) of the pushforward measure (../calculus/Differential Geometry.html#Pushforward Measure) \(X_*\mathrm{P}\) with respect to the reference measure \(\mu\): \[ f_X(x) := \frac{d X_*\mathrm{P}}{d\mu} \]
- That is, \[ X_*\mathrm{P}(A) = \int_A f_X\,d\mu \] for any measurable set \(A\in \Sigma\).
4.3.2. Properties
- For a real-valued random variable with an absolutely continuous univariate distribution, the density is the derivative of the cumulative distribution function:
- \[ f_X(x) = \frac{dF_X}{dx} \]
4.4. Cumulative Probability Function
- The cumulative probability function of a real-valued random variable is \[ F_X(x) := \mathrm{P}[X\le x]. \]
4.5. Normalization and Denormalization
- The random variable \(X\) is contravariant, and the underlying
probability function is covariant:
- \[ Z = \frac{X - \mu}{\sigma} \]
- \[
f_Z(x) = \sigma f_X(\sigma x + \mu)
\]
- The \(\sigma\) factor is the normalization constant, to compensate for \(f_X\) being scaled down in the \(x\) direction by the factor of \(\sigma\).
- It is the inverse of the normalization:
- \[ X = \sigma Z + \mu \]
- \[ f_X(x) = \frac{1}{\sigma}f_Z\left(\frac{x - \mu}{\sigma}\right) \]
- The random variable and the probability distribution transform oppositely.
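The denormalization rule can be verified numerically. The sketch below assumes \(Z\) standard normal and checks \(f_X(x) = \frac{1}{\sigma}f_Z\left(\frac{x-\mu}{\sigma}\right)\) against the \(\mathcal{N}(\mu,\sigma^2)\) density written out directly:

```python
import math

def f_std(z):
    # standard normal density f_Z
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def f_denorm(x, mu, sigma):
    # denormalization: f_X(x) = (1/sigma) f_Z((x - mu)/sigma)
    return f_std((x - mu) / sigma) / sigma

def f_normal(x, mu, sigma):
    # N(mu, sigma^2) density written out directly
    return math.exp(-((x - mu) / sigma) ** 2 / 2) / (sigma * math.sqrt(2 * math.pi))

xs = [-3.0, -0.5, 0.0, 1.7, 4.2]
print(all(abs(f_denorm(x, 1.0, 2.0) - f_normal(x, 1.0, 2.0)) < 1e-12 for x in xs))
```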
4.6. Combination
4.6.1. Distribution of Sum
- \[ f_{X+Y}(x) = (f_X * f_Y)(x) \]
- where \(*\) is the convolution.
- List of convolutions of probability distributions - Wikipedia
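A discrete sketch of the convolution rule, taking two fair dice as the assumed example:

```python
from fractions import Fraction as F

die = {k: F(1, 6) for k in range(1, 7)}

def convolve(p, q):
    # (p * q)(s) = sum_x p(x) q(s - x): the pmf of the sum of two
    # independent random variables with pmfs p and q
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, F(0)) + px * qy
    return out

two_dice = convolve(die, die)
print(two_dice[7])             # 1/6, the most likely total
print(sum(two_dice.values()))  # 1
```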
4.6.2. Product Distribution
- \[ f_{XY}(z) = \int_{-\infty}^\infty f_{X,Y}(x, z/x)\frac{1}{|x|}\,dx \]
- Distribution of the product of two random variables - Wikipedia
4.6.3. Ratio Distribution
- \[ f_{X/Y}(z) = \int_{-\infty}^\infty f_{X,Y}(zy, y)|y|\,dy \]
- Ratio distribution - Wikipedia
4.7. Instances
- The unexpected probability result confusing everyone - YouTube
- For independent uniformly distributed random variables \(X\) and \(Y\), the probability distribution of \(\max(X,Y)\) is equal to the probability distribution of \(\sqrt{X}\).
- Similarly, the probability distribution of \(\max(X_1, X_2,\dots, X_n)\) is equal to the probability distribution of \(\sqrt[n]{X_1}\).
- Shockingly, the probability distribution of \((XY)^Z\) is uniform, for independent uniformly distributed random variables \(X,Y,Z\).
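A seeded Monte Carlo sketch of the first claim: both \(\max(X,Y)\) and \(\sqrt{X}\) should have CDF \(F(t)=t^2\) on \([0,1]\) (the seed and sample size are arbitrary choices):

```python
import random

random.seed(0)
n = 200_000
max_xy = [max(random.random(), random.random()) for _ in range(n)]
sqrt_x = [random.random() ** 0.5 for _ in range(n)]

def ecdf(sample, t):
    # empirical CDF: fraction of the sample at or below t
    return sum(v <= t for v in sample) / len(sample)

# Both empirical CDFs track F(t) = t^2 on [0, 1].
for t in (0.25, 0.5, 0.75):
    print(t, ecdf(max_xy, t), ecdf(sqrt_x, t), t * t)
```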
4.7.1. Discrete Distributions
4.7.1.1. Bernoulli Distribution
4.7.1.2. Binomial Distribution
4.7.1.3. Multinomial Distribution
- The higher-dimensional generalization of the binomial distribution.
- \[
f(x_1,\dots,x_k;n,p_1,\dots,p_k) = \binom{n}{x_1,\dots,x_k}\prod_{i=1}^k p_i^{x_i}
\]
- using the multinomial coefficient (../Number Theory.html#orgc651707).
- Multinomial distribution - Wikipedia
4.7.1.4. Poisson Distribution
- \[ f(k;\lambda) = \frac{\lambda^ke^{-\lambda}}{k!} \]
- https://en.wikipedia.org/wiki/Poisson_distribution
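A quick check of the pmf (truncating the infinite sum at \(k=100\); \(\lambda=3\) is an arbitrary choice): it sums to one, and its mean is \(\lambda\):

```python
import math

def poisson_pmf(k, lam):
    # f(k; lambda) = lambda^k e^(-lambda) / k!
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 3.0
total = sum(poisson_pmf(k, lam) for k in range(100))
mean = sum(k * poisson_pmf(k, lam) for k in range(100))
print(total)  # ≈ 1
print(mean)   # ≈ lambda = 3
```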
4.7.1.5. Geometric Distribution
4.7.2. Continuous Distributions
4.7.2.1. Normal Distribution
- Gaussian Distribution
- \(\mathcal{N}(\mu, \sigma^2)\)
4.7.2.1.1. Probability Density Function
- \[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} \]
- Normalizing the area of the Gaussian function:
\[
e^{-x^2} \rightsquigarrow \frac{1}{\sqrt{\pi}}e^{-x^2}
\]
- This was the definition of the standard normal by Carl Friedrich Gauss.
- It has a standard deviation of \(1/\sqrt{2}\).
- Denormalizing the probability distribution to mean \(\mu\) and standard deviation \(\sigma\): \[ \frac{1}{\sqrt{\pi}}e^{-x^2} \rightsquigarrow \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}. \]
4.7.2.2. Chi-Squared Distribution
4.7.2.2.1. Definition
- For independent, standard normal random variables \(Z_1, \dots, Z_k\), \[ Q = \sum_{i=1}^k Z_i^2 \] is distributed according to the chi-squared distribution with \(k\) degrees of freedom: \[ Q \sim \chi^2(k). \]
- This distribution arises in the least squares method.
- Chi-squared distribution - Wikipedia
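A seeded simulation sketch of the definition (\(k\) and the sample size are arbitrary): since \(\mathrm{E}[Z_i^2]=1\), the sample mean of \(Q\) should be close to \(k\):

```python
import random

random.seed(0)
k, n = 4, 100_000
# Q = sum of squares of k independent standard normals  ~  chi^2(k)
samples = [sum(random.gauss(0, 1) ** 2 for _ in range(k)) for _ in range(n)]
mean = sum(samples) / n
print(mean)  # E[Q] = k, so this should be close to 4
```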
4.7.2.3. F-Distribution
4.7.2.4. Student's T-Distribution
- T-Distribution
- The name comes from William Sealy Gosset, who published under the pseudonym "Student".
- It has fat tails. The shape of the t-distribution approaches the standard normal distribution (4.7.2.1) as the sample size increases.
- It is a parametric family \(t_{\rm DF}\) with respect to the degrees of freedom, which is directly related to the sample size.
4.7.2.5. Cauchy Distribution
- Lorentz Distribution, Cauchy-Lorentz Distribution, Lorentzian Function, Breit-Wigner Distribution
- \[ f(x; x_0, \gamma) = \frac{1}{\pi\gamma\left[1+\left(\frac{x-x_0}{\gamma}\right)^2\right]}. \]
- Its mean is undefined.
4.7.2.6. Exponential Distribution
- Negative Exponential Distribution
- In terms of rate \(\lambda\):
- \[ f(x;\lambda) = \lambda e^{-\lambda x} \]
- with \(f(x;\lambda) = 0\) if \(x<0\).
- In terms of scale parameter \(\beta = 1/\lambda\):
- \[ f(x;\beta) = \frac1\beta e^{-x/\beta} \]
- The distance between consecutive events in a Poisson point process.
- It is the continuous analogue of the geometric distribution.
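A seeded sketch of both characterizations (rate \(\lambda=2\) assumed): exponential gaps have mean \(1/\lambda\), and stacking them as inter-arrival times yields on average \(\lambda\) arrivals per unit interval:

```python
import random

random.seed(0)
lam = 2.0
n = 100_000
gaps = [random.expovariate(lam) for _ in range(n)]
print(sum(gaps) / n)  # mean of Exp(lambda) is 1/lambda = 0.5

def arrivals_in_unit_interval():
    # stack exponential gaps as inter-arrival times of a Poisson
    # point process; count the arrivals landing in [0, 1]
    t, count = 0.0, 0
    while True:
        t += random.expovariate(lam)
        if t > 1.0:
            return count
        count += 1

counts = [arrivals_in_unit_interval() for _ in range(n)]
print(sum(counts) / n)  # ≈ lambda = 2
```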
4.7.2.7. Beta Distribution
4.7.2.7.1. Definition
- \[
f(x; \alpha, \beta) := \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\mathrm{B}(\alpha, \beta)}
\]
- where \(\mathrm{B}(\alpha,\beta)\) is the beta function.
4.7.2.7.2. Properties
- It is the probability distribution of the estimator \(\hat{p}\) of
the probability of observing a positive event after observing
\(\alpha-1\) positive events and \(\beta-1\) negative events.
- \[ \hat{p} = \frac{\alpha-1}{\alpha + \beta-2} \sim \mathcal{Be}(\alpha, \beta) \]
- \[ \mathcal{Be}(\alpha, \beta) = \mathrm{P}[X_{\alpha+\beta-1} = \omega_+ \mid X_{\sigma(1)} = \cdots = X_{\sigma(\alpha-1)} = \omega_+, X_{\sigma(\alpha)} =\cdots = X_{\sigma(\alpha+\beta - 2)} = \omega_-] \]
- where \(X_i\)s are independent and identically distributed (iid) random variables, with unknown probability distribution.
- The harmonic mean (./Statistics.html#org3f99914) is symmetric:
\[
H_X = H_{1-X}
\]
- where \(H_X\) is defined to be \[ H_X := \frac{1}{\mathrm{E}\left[\frac{1}{X}\right]} \]
- Concentration \(\kappa := \alpha+\beta\)
- Mode \[ \omega = \frac{\alpha-1}{\alpha+\beta -2} \]
- Variance
\[
\sigma^2 = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha+\beta +1)}
\]
- It is asymptotically equal to the variance of the sample mean \(\bar{x}\) of random variables distributed as a Bernoulli distribution (4.7.1.1). \[ \frac{\hat{p}\hat{q}}{n} = \widehat{\sigma^2[\hat{p}]}, \quad\sigma^2[\bar{x}]= \sigma^2[\hat{p}] \]
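The mean and the variance formula can be checked numerically. The sketch below builds the density from the gamma function via \(\mathrm{B}(\alpha,\beta)=\Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)\) and integrates by midpoint rule, with \(\alpha=3,\beta=5\) as arbitrary parameters:

```python
import math

def beta_fn(a, b):
    # B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def beta_pdf(x, a, b):
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_fn(a, b)

a, b = 3.0, 5.0
n = 100_000
h = 1.0 / n
xs = [(i + 0.5) * h for i in range(n)]  # midpoint grid on (0, 1)
mean = sum(x * beta_pdf(x, a, b) for x in xs) * h
var = sum((x - mean) ** 2 * beta_pdf(x, a, b) for x in xs) * h
print(mean)  # alpha / (alpha + beta) = 3/8
print(var)   # alpha beta / ((alpha+beta)^2 (alpha+beta+1)) = 15/576
```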
4.7.2.7.3. Beta Prime Distribution
- The probability distribution of the estimator of odds.
5. Parametric Family
- Explaining Parametric Families - YouTube
- A parametric family is a statistical model of an unknown probability distribution, which one can do analysis on.
6. Stochastic Process
6.1. Properties
- A stochastic process \( (X_t)_{t\in \mathbb{T}} \) on the probability space \( (\Omega, \Sigma, \mathrm{P}) \)
generates the natural filtration \( (\mathcal{F}_t)_{t\in \mathbb{T}} \) of \( \Sigma \):
\[
\mathcal{F}_t := \sigma(X_k \mid k \le t).
\]
- The probability space is the product space of all domains of \( (X_t)_{t\in\mathbb{T}} \).
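A small sketch of a natural filtration for two coin tosses (a hypothetical, finite example): each \(\mathcal{F}_t\) is represented by the partition of \(\Omega\) whose atoms group the paths agreeing up to time \(t\), and each later partition refines the earlier ones:

```python
from itertools import product

omega = list(product("HT", repeat=2))  # [('H','H'), ('H','T'), ('T','H'), ('T','T')]

def partition_at(t):
    # F_t is generated by the first t tosses: atoms collect the
    # sample paths that agree on coordinates 0..t-1
    atoms = {}
    for w in omega:
        atoms.setdefault(w[:t], set()).add(w)
    return list(atoms.values())

# F_0 is trivial (one atom); F_2 separates every path.
for t in range(3):
    print(t, partition_at(t))
```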
7. Filtration
7.1. Definition
- Filtration of a structure \(S\) is a totally ordered collection of substructures \((\mathcal{F}_i)_{i\in I}\) such that \(i\le j\implies \mathcal{F}_i \subseteq \mathcal{F}_j \subseteq S\).
8. References
- Marginal distribution - Wikipedia
- Conditional probability distribution - Wikipedia
- Bayesian probability - Wikipedia
- Probability space - Wikipedia
- Random variable - Wikipedia
- Probability distribution - Wikipedia
- Cauchy distribution - Wikipedia
- Exponential distribution - Wikipedia
- Beta distribution - Wikipedia
- Beta prime distribution - Wikipedia
- The Beta Distribution : Data Science Basics - YouTube
- Poisson point process - Wikipedia
- Filtration (probability theory) - Wikipedia
- Filtration (mathematics) - Wikipedia